Airbnb Data Analytics Project - Shiny App

Marvin Martin & Aflak Michel Omar (ING5 BDA Gr01A)

26/11/2020

Introduction

Data Extraction

Daily Airbnb data is very valuable: an investor can use it to decide which real estate option is likely to be the most profitable. In this project, we use scraped data from 6 countries (France, Spain, the Netherlands, Germany, Belgium and Italy), aggregated over a period of time. For each city of these countries, we kept the 3 latest scrape dates.

These datasets can be downloaded from http://insideairbnb.com/get-the-data.html. The CSV file data/all_data_urls.csv lists all the scraped URLs, ready to download.

Preprocessing overview

Because these datasets are huge, we preprocessed them to keep only the important information and a reasonable amount of data (to fit computation and time limits). We went through several steps:

################### Code From utils/tools.R ##################################
urls <- read.csv(file.path("./data/all_data_urls.csv")) # Step 1 
df <- extract_all_meta(urls) # Step 2 
lastest_dates <- 3 # Step 3
countries <- c("france", "spain", "the-netherlands", "germany", "belgium","italy") # Step 4
download_data(df, countries, lastest_dates) # Step 5
listings <- load_global_listings() # Step 6

We reduced the data size from several GB to roughly a hundred MB. We are now ready to play with it!
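To illustrate Steps 2 and 3, here is a minimal sketch of how the metadata could be parsed from the URLs and how the n latest dates per city could be kept. It assumes URLs laid out as host/country/region/city/date/data/listings.csv.gz. The helpers parse_meta and keep_latest are hypothetical and are not the actual extract_all_meta from utils/tools.R; the example URLs and their dates are made up.

```r
# Hypothetical helper: split each URL path into its metadata fields.
# Assumes the layout <host>/<country>/<region>/<city>/<date>/data/listings.csv.gz
parse_meta <- function(urls) {
  path  <- sub("^https?://[^/]+/", "", urls)   # drop scheme and host
  parts <- strsplit(path, "/", fixed = TRUE)
  data.frame(
    country = vapply(parts, `[`, character(1), 1),
    region  = vapply(parts, `[`, character(1), 2),
    city    = vapply(parts, `[`, character(1), 3),
    date    = as.Date(vapply(parts, `[`, character(1), 4)),
    url     = urls,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: keep the n most recent scrape dates for each city.
keep_latest <- function(meta, n) {
  by_city <- split(meta, list(meta$country, meta$city), drop = TRUE)
  kept <- lapply(by_city, function(d) {
    head(d[order(d$date, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Illustrative URLs only (dates are made up)
base <- "http://data.insideairbnb.com"
urls <- c(
  paste0(base, "/france/ile-de-france/paris/2020-09-12/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-08-19/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-07-10/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-06-11/data/listings.csv.gz"),
  paste0(base, "/spain/catalonia/barcelona/2020-09-12/data/listings.csv.gz")
)

meta <- keep_latest(parse_meta(urls), n = 3)
print(meta[, c("country", "city", "date")])
```

With n = 3, the oldest of the four Paris entries is dropped while the single Barcelona entry is kept.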

Preprocessing Steps

Starting with raw data, we went through several steps:

[Step 1] Load the CSV file containing the URLs and metadata (read.csv)
[Step 2] Extract “country”, “region”, “city”, “date” and “url” from the CSV into a data frame (extract_all_meta)
[Step 3] Specify the number “n” of latest scraping dates to keep.
[Step 4] Select the list of 6 countries you want to work on.
[Step 5] Go through this data frame line by line and do the following steps (download_data and prepare_data):

[Step 5 - Note] This step results in one large CSV file for every city of the listed countries.
We could have written the CSVs to files, but since this step takes more than 10 minutes, we preferred to keep them in memory.

[Step 6] Get the final preprocessed dataset by merging all the city CSVs into a single data frame (load_global_listings).
This step is performed when the server starts and takes around 20 seconds.
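The merge in Step 6 can be pictured with a small sketch. It assumes the per-city tables are available as a list of data frames sharing the same columns; merge_listings is a hypothetical stand-in for load_global_listings, and the values are made up.

```r
# Toy per-city tables standing in for the preprocessed city data (made-up values)
city_tables <- list(
  data.frame(id = 1:2, city = "paris",     price = c(80, 120)),
  data.frame(id = 3L,  city = "barcelona", price = 95)
)

# Hypothetical stand-in for load_global_listings():
# stack all city tables into a single data frame
merge_listings <- function(tables) {
  listings <- do.call(rbind, tables)
  rownames(listings) <- NULL
  listings
}

listings <- merge_listings(city_tables)
print(dim(listings))   # number of rows and columns
```

For many large tables, data.table::rbindlist is a faster alternative to do.call(rbind, ...).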

Data Overview

Dataset sample:

Here is the shape of our dataset:
# Publications: 1321825
# Features: 21

Feature names are:

## - id
## - country
## - region
## - city
## - date
## - neighbourhood_cleansed
## - latitude
## - longitude
## - property_type
## - room_type
## - accommodates
## - bedrooms
## - beds
## - price
## - minimum_nights
## - maximum_nights
## - review_scores_rating
## - availability_30
## - price_30
## - revenue_30
## - latitudelongitude

Shiny App: Tab1 - Analysis by comparing several cities

Tab 1 - Analysis by comparing several cities

Shiny App: Tab2 - Analysis of a single city

Tab 2 - Analysis of a single city

Shiny App: Structure and code overview

Libraries

We used several libraries (web app, plotting, data manipulation) to build this project:
shiny, googleVis, ggplot2, dplyr, data.table, stringr and glue
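Before running the app, it can help to check which of these packages are missing. The sketch below only reports them; the install.packages call is left commented out so nothing is installed by accident.

```r
# Packages named in the list above
pkgs <- c("shiny", "googleVis", "ggplot2", "dplyr",
          "data.table", "stringr", "glue")

# requireNamespace() returns FALSE (instead of failing) when a package is absent
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are available.")
}
```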

UI

################### Code From shinyApp/ui.R ##################################
# THIS IS PSEUDOCODE!
fluidPage
  tabsetPanel
    tabPanel # Analysis 1 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          Checkbox, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput ...
    tabPanel # Analysis 2 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          Checkbox, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput ...
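As a runnable counterpart to the outline above, here is a minimal sketch of the same layout. All input and output IDs (tab1_feature, tab1_city_ui, tab1_plot, ...) and the widget labels are hypothetical and chosen for illustration; the real shinyApp/ui.R may differ.

```r
library(shiny)

# Hypothetical helper: builds the tool bar of one tab.
# The prefix keeps the IDs of the two tabs distinct.
tool_bar <- function(prefix) {
  sidebarPanel(
    checkboxInput(paste0(prefix, "_log"), "Log scale", value = FALSE),
    selectInput(paste0(prefix, "_feature"), "Feature",
                choices = c("price", "revenue_30", "availability_30")),
    uiOutput(paste0(prefix, "_city_ui"))  # filled by renderUI() on the server
  )
}

ui <- fluidPage(
  tabsetPanel(
    tabPanel("Analysis 1",                       # comparing several cities
      sidebarLayout(
        tool_bar("tab1"),
        mainPanel(plotOutput("tab1_plot"))
      )
    ),
    tabPanel("Analysis 2",                       # deep dive in one city
      sidebarLayout(
        tool_bar("tab2"),
        mainPanel(plotOutput("tab2_plot"))
      )
    )
  )
)
```

For googleVis charts, htmlOutput would take the place of plotOutput in the main panel, as in the outline.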

Server

################### Code From shinyApp/server.R ##################################
# THIS IS PSEUDOCODE!
listings <- load_global_listings() # Load the preprocessed data
# Server
server
  # Tab 1 variables
  reactive # Reactive data frame (filtered by country / cities / features)
  renderUI  # UI sent from the server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # Plots sent from the server to htmlOutput, plotOutput (histogram, ...)
  # Tab 2 variables
  reactive # Reactive data frame (filtered to one city)
  renderUI  # UI sent from the server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # Plots sent from the server to htmlOutput, plotOutput (map, ...)
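The reactive / renderUI / renderPlot pattern outlined above can be sketched for a single tab as follows. This is an illustration under assumptions, not the actual shinyApp/server.R: the toy listings table replaces load_global_listings(), and the IDs tab1_city_ui, tab1_city and tab1_plot are hypothetical (a matching UI would need uiOutput("tab1_city_ui") and plotOutput("tab1_plot")).

```r
library(shiny)

# Toy stand-in for load_global_listings() (made-up values)
listings <- data.frame(
  city  = c("paris", "paris", "barcelona", "barcelona"),
  price = c(80, 120, 95, 60)
)

server <- function(input, output, session) {
  # UI built on the server and sent to uiOutput("tab1_city_ui")
  output$tab1_city_ui <- renderUI({
    selectInput("tab1_city", "City", choices = unique(listings$city))
  })

  # Reactive data frame: re-filtered whenever input$tab1_city changes
  city_data <- reactive({
    req(input$tab1_city)                 # wait until a city is selected
    listings[listings$city == input$tab1_city, ]
  })

  # Plot sent to plotOutput("tab1_plot")
  output$tab1_plot <- renderPlot({
    hist(city_data()$price, main = input$tab1_city, xlab = "price")
  })
}
```

Keeping the filtering in one reactive means every plot that calls city_data() reuses the same filtered table instead of recomputing it.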

App Usage: Analysis 1 Comparing Cities

Each tab is split into two vertical parts: Tool Bar and Plots

Tool Bar

You can:

Plots

App Usage: Analysis 2 Deep Dive in one City

Tool Bar

You can:

Plots

Let’s try this in our Shiny App

library(shiny) # You might need to install more packages (ggplot2, googleVis, ...)

# Option 1: run the app from a local copy of the repository
setwd("~/YOUR_PATH/Airbnb-Analysis-ShinyApp")
runApp(appDir = "shinyApp")

# Option 2: run the app directly from GitHub
setwd("~/YOUR_PATH/Airbnb-Analysis-ShinyApp")
runGitHub("Airbnb-Analysis-ShinyApp", "MarvinMartin24", subdir = "shinyApp")